Data assimilation is a notoriously complex operational problem, whose interdependent steps must process large volumes of NWP data to perform the task.
From Christopher Harrop:
Workflow Management is a concept that originated in the 1970s to handle business process management. Workflow management systems were developed to manage complex collections of business processes that need to be carried out in a certain way, with complex interdependencies and requirements.
…scientific workflows are driven by the scientific data that “flows” through them… usually triggered by the availability of some kind of input data, and a task’s result is usually some kind of data that is fed as input to another task in the workflow.
The complexity of data assimilation cycling is demonstrated in the following data flow diagram…
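The data-driven triggering described above can be sketched as a toy cycling loop: each task fires only once its input data exists, and its output becomes the input of the next task or cycle. All task and file names here are hypothetical illustrations, not the real GSI-WRF system:

```python
# Toy sketch of data-driven cycling: a task runs when all of its
# inputs are present, and its output feeds later tasks and cycles.
# Task and file names are hypothetical, for illustration only.

def run_cycles(n_cycles):
    available = {"background.0"}  # initial first-guess data on disk
    log = []
    # (task name, input patterns, output pattern), per cycle index c
    tasks = [
        ("analysis", ["obs.{c}", "background.{c}"], "analysis.{c}"),
        ("forecast", ["analysis.{c}"], "background.{c1}"),
    ]
    for c in range(n_cycles):
        available.add(f"obs.{c}")  # observations arrive each cycle
        for name, inputs, output in tasks:
            needed = {p.format(c=c, c1=c + 1) for p in inputs}
            if needed <= available:  # trigger: all inputs present
                available.add(output.format(c=c, c1=c + 1))
                log.append(f"{name}.{c}")
    return log

print(run_cycles(2))  # analysis and forecast alternate, cycle by cycle
```

The point of the sketch is that no task is scheduled by wall-clock time alone; each cycle's analysis waits on that cycle's observations and the previous cycle's forecast output, which is exactly the interdependency structure a workflow manager has to track.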
My training leads me to analyze the data assimilation problem within the framework of statistical learning.
To perform my research, I need to run many simulations to study:
My re-forecasting workflows are also non-standard from the perspective of operational forecasting.
I also perform simulations on multiple HPC platforms with different system architectures, job schedulers, and software stacks, so I need to keep my software as portable and system-agnostic as possible.
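One simple pattern for keeping workflow code system-agnostic is to isolate every platform-specific setting behind a single lookup keyed on the hostname. The sketch below assumes hypothetical platform names, scheduler choices, and scratch paths; it is one possible design, not the repositories' actual mechanism:

```python
import re

# Hypothetical sketch: collect all platform-specific settings (scheduler,
# scratch filesystem, etc.) in one table keyed by hostname patterns, so
# the rest of the workflow code never branches on the machine it runs on.
PLATFORMS = {
    r"^expanse":  {"scheduler": "slurm", "scratch": "/expanse/scratch"},
    r"^cheyenne": {"scheduler": "pbs",   "scratch": "/glade/scratch"},
}

def platform_config(hostname):
    """Return the settings for the first pattern matching the hostname."""
    for pattern, cfg in PLATFORMS.items():
        if re.match(pattern, hostname):
            return cfg
    raise KeyError(f"unknown platform: {hostname}")
```

Downstream code then asks only for `platform_config(hostname)["scheduler"]` or `["scratch"]`, and porting to a new machine means adding one table entry rather than editing scripts throughout the stack.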
These demands in my research have led me to develop an end-to-end data assimilation cycling system in the GSI-WRF-MET stack using the Rocoto Workflow Manager.
This currently includes a user-facing IPython API for issuing Rocoto workflow commands and for plotting results with Matplotlib.
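A user-facing API of this kind can be as thin as a wrapper that shells out to the Rocoto command-line tools from an interactive session. The `rocotorun`/`rocotostat` commands and their `-w`/`-d` flags are standard Rocoto conventions; the helper functions below are my own hypothetical sketch, not the repository's actual API:

```python
import subprocess

def rocoto_command(action, workflow_xml, database):
    """Build the argument list for a Rocoto CLI call.

    action is e.g. "run" or "stat"; workflow_xml is the workflow
    definition file and database is Rocoto's state database file.
    """
    return [f"rocoto{action}", "-w", workflow_xml, "-d", database]

def rocoto(action, workflow_xml, database):
    """Invoke Rocoto and return its captured stdout (raises on failure)."""
    result = subprocess.run(
        rocoto_command(action, workflow_xml, database),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

From IPython, `rocoto("stat", "wf.xml", "wf.db")` would then return the workflow status table as a string, ready to be parsed or passed on to Matplotlib for plotting.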
The GSI-WRF-Cycling-Template and MET-tools code repositories are licensed for reuse, redistribution, and modification under the Apache License, Version 2.0.
NOTE: apart from in-code comments, there is currently no documentation, as the version available now is to be replaced ASAP with a new version built on the Cylc workflow system.